Add Web Audio priming/blessing #3974

Merged
merged 19 commits into from
Jul 7, 2021

@compulim compulim commented Jul 6, 2021

Fixes #2316. Fixes #3823. Fixes #3899.

Changelog Entry

Added

  • Resolves #2316. Added blessing/priming of AudioContext when clicking on microphone button, by @compulim, in PR #3974

Fixed

  • Fixes #3823 and #3899. Fix speech recognition and synthesis on Safari, in PR #3974

Description

Instead of using the microphone input source from Speech SDK (a.k.a. AudioConfig.fromMicrophone), we are implementing our own audio input source, for a few reasons:

  • In Safari, AudioContext needs to be primed/blessed
    • Blessing means we call AudioContext.resume() from code that is initiated by a user gesture, such as clicking on the microphone button
  • AudioContext should be reused to keep its blessing status
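
The two points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: createAudioContextBlesser, ResumableAudioContext and micButton are hypothetical names, and the factory is injected so the sketch does not depend on a browser.

```typescript
// Minimal shape of the parts of AudioContext this sketch relies on.
interface ResumableAudioContext {
  readonly state: string;
  resume(): Promise<void>;
}

// Hypothetical helper: creates one context lazily and reuses it, so the
// blessing obtained from a user gesture is not thrown away.
function createAudioContextBlesser<T extends ResumableAudioContext>(factory: () => T) {
  let shared: T | undefined;

  return {
    // Call this synchronously from a user gesture handler (e.g. a click).
    bless(): Promise<T> {
      let context = shared;

      // A closed context cannot be resumed, so replace it.
      if (!context || context.state === 'closed') {
        context = factory();
        shared = context;
      }

      const blessed: T = context;

      // resume() is invoked within the call stack of the gesture handler.
      return blessed.resume().then(() => blessed);
    }
  };
}

// Browser usage (assumption: `micButton` is the microphone button element):
// const blesser = createAudioContextBlesser(() => new AudioContext());
// micButton.addEventListener('click', () => void blesser.bless());
```

Keeping the instance inside the helper is what preserves the blessing between recognitions; creating a fresh AudioContext each time would need a fresh user gesture.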

However, the adapter provided by Speech SDK:

  • Does not keep the AudioContext instance; it always closes it after use
  • May not call AudioContext.resume() from a user gesture event

This means that, in Safari, it may occasionally lose access to the microphone.

We also added a new internal hook, useResumeAudioContext, which we can call from time to time to keep the AudioContext blessed.

Design

Our implementation is based on AudioConfig.fromStreamInput and Push/PullAudioInputStream from Speech SDK.

There are a few caveats with Push/PullAudioInputStream:

  • PushAudioInputStream
    • We can call write() whenever we have a buffer to push to it
    • Does not support more than one recognition; it will throw an error
    • Does not signal us when the input stream is being torn down by the speech recognition engine (for example, when the recognition is completed)
  • PullAudioInputStream
    • We provide a callback function, which the PullAudioInputStream will call continuously to pull more data from us
    • The callback function must be synchronous
    • That means all data must be ready synchronously before the PullAudioInputStream is set up
    • Microphone input is not synchronous, and its data is not ready at t=0
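
The mismatch between a synchronous pull and an asynchronous source can be seen in a small sketch. ChunkQueue below is hypothetical and is not part of Speech SDK or this PR; its read(target) only imitates the general shape of a synchronous pull callback, and the sketch makes no claim about how Speech SDK treats a read that returns 0 bytes.

```typescript
// Hypothetical bridge between an asynchronous producer and a synchronous reader.
class ChunkQueue {
  private chunks: Uint8Array[] = [];

  // Called whenever the asynchronous source (e.g. a microphone) produces data.
  push(chunk: ArrayBuffer): void {
    // Copy the bytes so later changes to the caller's buffer do not affect the queue.
    this.chunks.push(new Uint8Array(chunk.slice(0)));
  }

  // Synchronous pull: copies as many queued bytes as fit into `target`
  // and returns the number of bytes written, or 0 if nothing is queued.
  read(target: ArrayBuffer): number {
    const out = new Uint8Array(target);
    let written = 0;

    while (written < out.length) {
      const head = this.chunks[0];

      if (head === undefined) {
        break; // Nothing more is available right now.
      }

      const count = Math.min(head.length, out.length - written);

      out.set(head.subarray(0, count), written);
      written += count;

      if (count === head.length) {
        this.chunks.shift(); // The whole chunk was consumed.
      } else {
        this.chunks[0] = head.subarray(count); // Keep the unread remainder.
      }
    }

    return written;
  }
}
```

A reader that pulls before the first push() gets 0 bytes back, which is the situation a microphone source is in at t=0.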

Instead of building on Push/PullAudioInputStream, we are building on their base class, AudioInputStream. However, there are a lot of quirks in the original implementation:

  • attach is called, but detach is never called
  • turnOff is called, but turnOn is never called
  • close is marked as abstract (must implement), but never called

We wrapped the Speech SDK AudioInputStream implementation, cleaned up its quirks, and exposed it internally as createAudioConfig, which only requires 1 callback:

  • attach, which returns a Promise of:
    • Streams of ArrayBuffer chunks
    • Audio format (e.g. 16-bit mono 16 kHz)
    • Device info (manufacturer, model, connectivity type, device type)
  • turnOff (optional), which will be called before the device is torn down
    • It is not called when used in Direct Line Speech
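
As a rough illustration of this contract, the sketch below models a mocked input and a consumer. All type and function names here (ChunkReader, AttachResult, AudioSourceCallbacks, createMockAudioSource, drain) are hypothetical and are not the actual signatures of createAudioConfig; the field names in the format and device info objects are assumptions as well.

```typescript
// Hypothetical types mirroring the description above.
type ChunkReader = { read: () => Promise<ArrayBuffer | null> }; // null marks end of stream
type AudioFormat = { bitsPerSample: number; channels: number; samplesPerSec: number };
type DeviceInfo = { manufacturer: string; model: string; connectivity: string; type: string };
type AttachResult = { reader: ChunkReader; format: AudioFormat; deviceInfo: DeviceInfo };

type AudioSourceCallbacks = {
  attach: () => Promise<AttachResult>;
  turnOff?: () => void;
};

// Mocked input: emits `chunkCount` chunks of silence as 16-bit mono 16 kHz audio.
function createMockAudioSource(chunkCount: number, onTurnOff: () => void): AudioSourceCallbacks {
  let remaining = chunkCount;

  return {
    attach: async () => ({
      reader: {
        // 3200 bytes = 100 ms at 16,000 samples/s with 2 bytes per sample.
        read: async () => (remaining-- > 0 ? new ArrayBuffer(3200) : null)
      },
      format: { bitsPerSample: 16, channels: 1, samplesPerSec: 16000 },
      deviceInfo: { manufacturer: 'Mock', model: 'Silence', connectivity: 'Unknown', type: 'Microphones' }
    }),
    turnOff: onTurnOff
  };
}

// Hypothetical consumer: attaches, reads until the stream ends, then signals teardown.
async function drain(source: AudioSourceCallbacks): Promise<number> {
  const { reader } = await source.attach();
  let totalBytes = 0;

  try {
    for (let chunk = await reader.read(); chunk !== null; chunk = await reader.read()) {
      totalBytes += chunk.byteLength;
    }
  } finally {
    // turnOff is optional, so only call it when provided.
    if (source.turnOff) {
      source.turnOff();
    }
  }

  return totalBytes;
}
```

Because attach returns a Promise, the source is free to set up asynchronously (for example, waiting for microphone permission) before any chunk is delivered.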

We use our createAudioConfig in 2 different areas to prove its effectiveness: real microphone input, and mocked input in the test harness.

In the future, when we bump Speech SDK, we will need to make sure our createAudioConfig continues to work with the new version.

Specific Changes

  • Added new createAudioConfig for implementing a custom AudioConfig for Speech SDK without needing to understand its complexity
  • When using Cognitive Services or Direct Line Speech and audioConfig is not passed, we will use our custom AudioConfig for microphone input
    • When using botframework-directlinespeech-sdk alone without Web Chat, it will NOT use our custom AudioConfig
  • Added new mock microphone input in test harness to use the new createAudioConfig
  • Added internal useResumeAudioContext hook to continuously bless AudioContext object
  • Bless AudioContext object when pointerdown event is received from window object
  • I have added tests and executed them locally
  • I have updated CHANGELOG.md
  • I have updated documentation

Review Checklist

This section is for contributors to review your work.

  • Accessibility reviewed (tab order, content readability, alt text, color contrast)
  • Browser and platform compatibilities reviewed
  • CSS styles reviewed (minimal rules, no z-index)
  • Documents reviewed (docs, samples, live demo)
  • Internationalization reviewed (strings, unit formatting)
  • package.json and package-lock.json reviewed
  • Security reviewed (no data URIs, check for nonce leak)
  • Tests reviewed (coverage, legitimacy)

@compulim compulim marked this pull request as ready for review July 6, 2021 18:25